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Many high dimensional integrals can be reduced to the problem of 
finding the relative measures of two sets. Often one set will be exponen- 
tially larger than the other, making it difficult to compare the sizes. A 
standard method of dealing with this problem is to interpolate between 
the sets with a sequence of nested sets where neighboring sets have relative 
measures bounded above by a constant. Choosing such a well balanced 
sequence can be very difficult in practice. Here a new approach that au- 
tomatically creates such sets is presented. These well balanced sets allow 
for faster approximation algorithms for integrals and sums, and better 
tempering and annealing Markov chains for generating random samples. 
Applications such as finding the partition function of the Ising model and 
normalizing constants for posterior distributions in Bayesian methods are 
discussed. 

1 Introduction 

Monte Carlo methods for numerical integration can have enormous variance for 
the types of high dimensional problems that arise in statistics and combinatorial 
optimization applications. Consider a state space $7 with measure /x, and B C 
with finite measure. Then the problem considered here is approximating 



The classical Monte Carlo approach is to create a random variable X such 
that E{X) = Z where X has variance as small as possible. Unfortunately, it is 
often not possible to know the variance of X ahead of time, and this must be 
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estimated as well. How good the estimate of the variance is depends on even 
higher moments which are even more difficult to estimate. 

The method presented here creates an estimate of Z of the form e^l^ , where 
fc is a known constant and X is a Poisson random variable with mean khi{Z). 
Because the mean and variance for a Poisson random variable are the same, we 
simultaneously obtain our estimate of Z and knowledge of the variance of our 
estimate. 

In fact, the output from our method does the following: 

• Estimate Z to within a specified relative error with a specified failure 
probability in time 0(ln(Z)^). 

• Create a well balanced sequence of nested sets useful in building annealing 
and tempering Markov chains that can be used to generate Monte Carlo 
samples. 

• Develop an omnithermal approximation for partition functions arising 
from spatial point processes and Gibbs distributions. 

Previous work The new method presented here follows a long line of work 
using interpolating sets. For instance, Valleau and Card [18] introduced what 
they called multistage sampling where an intermediate distribution was added to 
make estimation more effective. Jerrum, Valiant and Vazirani [8] used a similar 
idea of self-reducibility, and carefully analyzed the computational complexity of 
the resulting approximation method. 

Suppose we are given two finite sets B' and B such that B' C B and ^B (the 
number of elements of B), is known. One way of viewing self-reducibility, is that 
it effectively requires a sequence of sets B = Bq D Bi D B2 D ■ ■ ■ D Bg = B' 
such that the relative sizes of the sets ^Bi+i/^Bi > a for a fixed constant 
a S (0, 1). Then an unbiased estimate bi of is created for each i. 

The product of these estimates will then be an unbiased estimator for ^B' /^B, 
and multiplying by gives the final estimate of i^B. 

For fixed a G (0, 1), it is easy to estimate i^Bi^i/^Bi with small relative 
error simply by drawing samples from H^Bi and counting the percentage that 
fall in jj=Bi^i. The relative standard deviation of a Bernoulli random variable 
with parameter a is (1 — a) /a, so it is important not to make a too small. On 
the other hand, if a is too large, then the nested sets are not shrinking much 
at each step, and it will require a lengthy sequence of such sets. To be precise, 
the number of sets i must satisfy i > ln„(#B'/#B) = ln(#B/#B')/ In(a-i) 
which goes to infinity as a goes to 1. Balancing these two considerations leads 
to a optimal a value of about 0.2031. 

The difficulty in applying self-reducibility is finding a sequence of sets such 
that ^Bi+i/^Bi is provably at least a, but not so close to 1 that the sequence 
of sets is too long. Ideally, ^Bi+i/^Bi would equal a for every i, or at least 
be very close. For fixed constants ai and a2, refer to a sequence of sets where 
the ratios ^Bi^i/^Bi fall in [q;i,q;2] for all i as well-balanced. 
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Well-balanced sequences have other uses as well. Methods of designing 
Markov chains such as simulated annealing [10], simulated tempering, and par- 
allel tempering [16, 5, 11] all require such a sequence of well-balanced sets in 
order to mix rapidly (see [19, 20].) 

Now consider the special case of (1.1) where Z is the normalizing constant 
of a posterior distribution of a Bayesian analysis. Skilling [15] introduced nested 
sampling as a way of generating a random sequence of nested sets. The advan- 
tage this method has over self-reducibility is that there is no need to have the 
sequence of sets in hand ahead of time. Instead, it builds up sets from scratch 
at random according to a well-defined procedure. 

The disadvantage is that it loses the property of self-reducible algorithms 
that the variance of the output could be bounded prior to running the algo- 
rithm. Because deterministic numerical integration was used in the method, 
the variance can be determined only up to a factor that depends upon the 
derivatives of a function that is difficult to compute. Therefore, nested sam- 
pling falls in the class of methods where the variance must be estimated, rather 
than bounded ahead of time as with self-reducibility. 

The Tootsie Pop Algorithm The method presented here is called The Toot- 
sie Pop Algorithm (TPA), and combines the tight analysis of self-reducibility 
by adding features similar to nested sampling. Like self-reducibility, it is very 
general, working over a wide variety of problems. This includes the nested sam- 
pling domain of Bayesian posterior normalization, but also includes many other 
problems where self-reducibility has been applied such as the Ising model. Por- 
tions of this work were presented at the Ninth Valencia International Meetings 
on Bayesian Statistics, and also appears in the conference proceedings [7] with 
a discussion. 

The name is somewhat unusual, and references an advertising campaign run 
for Tootsie Pop candies. A Tootsie Pop is a chocolate chewy center surrounded 
by a candy shell. The ad campaign asked "How many licks does it take to get to 
the center of a Tootsie Pop?" . Our algorithm operates in a similar fashion. Our 
set B is slowly whittled away until the center B' is reached. The number of steps 
taken to move from B to B' will be Poisson with mean \n{fi{B) / fi{B')), thereby 
allowing approximation of ijl{B) / ijl{B'). Therefore, the "number of licks" is 
exactly what is needed to form our estimate! 

1.1 Organization 

Section 2 describes the TPA procedure in detail, then Section 3 shows some 
applications. Section 4 then analyzes the expected running time of the method, 
and introduces a two phase approach to TPA. Section 5 describes how TPA 
can be used to build well-balanced nested sets for tempering. Section 6 shows 
how to create an approximation that simultaneously works for all members of 
a continuous family of sets at once. Finally, Section 7 discusses further areas of 
exploration with TPA techniques. 
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2 The Tootsie Pop Algorithm 



The TPA method has four general ingredients: 

1. A measure space {fl^J-,fi) 

2. Two finite measurable sets B and B' satisfying B' C B and > 0. 
The set B' is the center and B is the shell. 

3. A family of nested sets {A{(3) : /3 e R U {oo}} such that (3' < f3 implies 
A{l3') C A{(3), is a continuous function of /3, and the limit of 
fi{A{f3)) as (3 goes to — oo is 0. 

4. Special values /3b and that satisfy A{/3b) = B and A{(3b') = B' . 
With these ingredients, the TPA method is very simple to describe. 

1. Start with i — and — Pb- 

2. Draw a random sample Y from conditioned to lie in A{l3i). 

3. Let /3,+i = inf{/3 : Y G v4(^)}. 

4. If y e B' stop and output i. 

5. Else set i to be i + 1 and go back to step 2. 

Another way of describing the draw in Step 2 is that for measurable Z?, P{Y g 
D) = ^{Dr\A{l3i)) / ^{A{l3i)). At each step, the set A{l3i) shrinks with probability 
1, and so is slowly worn away until the sample falls into the region B' . 

Line 2 above deserves special attention. Drawing a random sample Y from 
conditioned to lie in A{l3i) is in general a very difficult problem. The good 
news is that the importance of this problem means that a vast literature for 
solving this problem exists. Markov chain Monte Carlo (MCMC) methods are 
critical to obtaining these samples, and variations on the early methods have 
blossomed over the last fifty years. Readers are referred to [14, 13, 3] and the 
references therein for more information. 

Of course, any other method for turning samples into approximations either 
implicitly or explicitly depend on the ability to execute some variant of line 2 
as well, so our algorithm is not actually demanding anything above and beyond 
what others require. The algorithm is easily modified to handle different meth- 
ods of simulating random variables. For instance, nested sampling [15] draws 
several such Y variables at once, and TPA can be written to do so as well. 

The key fact about this process is the following: 

Theorem 2.1. At any step of the algorithm, let 

= ln(A.(A(A)))-ln(M^(A+i))). 

Then the Ei are independent, identically distributed exponential random vari- 
ables with mean 1. 
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Proof. To simplify the notation, let m{b) = Begin by showing that 

each Ui = m{[3i+\) / m{j3i) is uniform over [0, 1]. Fix /3j > j^B' and let a G (0, 1). 
Then since m{b) is a continuous function in b with limf,_,._oo m(6) = 0, there 
exists a 6 G (— oo, Pi] such that ■m{h) / m.{Bi) = a. Call this value 13a- 

Let < e < m{j3B) — a- Then by the same reasoning there is a value Pa+e ^ 
(3b such that m{l3a+e) / t-^iPi) = a + e. Now consider y drawn from /x conditioned 
to lie in Then ft+i = inf{6 : Y e A{b)}. 

Moreover, {Ui < a} ^ {Y & A{l3a)}, an event which occurs with probability 
m{pa)/m{pi) = a. So P{Ui < a) > a. 

On the other hand, Y ^ A{J3a+f,) implies /3i+i > Pa+e which means that f/, = 
m{l3i+i)/m{l3i) > a + e. In other words, P{Ui < a + e) < P{Y e A{/3a+e}) = 
o+e. This holds for e arbitrarily small, hence by dominated convergence P(f7, < 
a) < a. Therefore, P{Ui < a) is at least and at most a, so P(f7, < a) = a, which 
shows that Ui is uniform on [0, 1]. 

Observing that the negative of the natural log of a uniform number of [0, 1] 
is an exponential with mean 1 completes the proof. □ 

For t{b) = ln(/i(^(6))), the theorem says the points t{f3o),t{f3i), . . . ,t{/3i) in 
a run of TPA are separated by exponential random variables of mean 1, in other 
words, these points form a homogeneous Poisson point process on [t{PB'),t{l3B)] 
of rate 1. 

2.1 Taking advantage of Poisson point processes 

The first application of this is to describe the total number of points used by 
a run of TPA, that is, the value of i at the end of the algorithm. Because the 
t{l3i) values form a Poisson point process, the distribution of i is Poisson with 

mean tiPe) - t{l3B') = ln(/x(S)/^(B'))- 

Furthermore, the union of k independent Poisson point processes of rate 
1 is also a Poisson point process of rate k. That means that after k runs of 
TPA, the distribution of the total number of samples used is Poisson with mean 

3 Applications 

The following examples illustrate some of the uses for TPA. 
3.1 The Ising model 

The Ising model is an example of a Gibbs distribution, where a function H : 
ri — )■ R gives rise to a distribution on f2: 

Prom applications in statistical physics, /3 > is known as the inverse temper- 
ature, and Z{l3) is called the partition function. 
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In the Ising model, each node of a graph G = {V, E) is assigned one of two 
values. There are many ways to represent the model. In the form considered 
here, each node is either or 1, and for x G {0,1}^, —H{x) is one plus the 
number of edges e E E such that the cndpoints of the edge have the same value 
in a;. This can be written as i?(a;) = — + — a;(j) — a;(j) + 2a;(i)a;(j))]. 

In order to embed this problem in the framework of TPA, add an auxiliary 
dimension to the configuration x. The auxiliary state space is 

fiaux(/3) = {ix,y) : x e {0,lV,y^ [0,eM-PH{x))}. 

Some notes on fiaux(/3): 

• The total length of the line segments in riaux(/3) is just Z{I3). That is to 
say, /x(r2aux(/3)) = where /j. is the one dimensional Lebesgue measure 
of the union of the line segments. 

• Let /3' < 13. Then since -H{x) > 0, rJaux(;3') C n^uAj3). Moreover, Z(/3) 
is a continuous function that goes to as /3 — )■ — oo. Therefore Condition 
2 of the TPA ingredients is satisfied. 

• For /3 = 0, y e [0, 1] for all x G {0, 1}. That means Z(0) = 2^. 

• Let (3 > 0. Then Oaux(/3) is the shell, and Oaux(O) is the center. 
With this in mind, the TPA algorithm works as follows. 

1. Start with 2 = and /Si = /3. 

2. Draw a random sample X from tt/s- , then draw Y (given X) uniformly 
from [0,exp(-^ii?(X))]. 

3. Let /3,+i = HY)/{^HiX)) 

4. If l3i+i < stop and output i. 

5. Else set i to be z + 1 and go back to step 2. 

One run of TPA wih require on average 1 + \n{Z {(3) / Z {0)) = 1 + \n{Z{(3)) - 
#y ln(2) samples from various values of /3, where i^V is the number of vertices 
of the graph. 

This method of adding an auxiliary variable allows TPA to be used on a 
variety of discrete distributions by changing the measure to one that varies 
continuously in the index. 

3.2 Posterior distributions 

In Bayesian analysis, often it is necessary to find the normalizing constant of a 
posterior distribution. This is known as the evidence for a model, and can be 
written: 
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where f{x) is a nonnegative density (the product of the prior density and the 
hkehhood of the data) and ft C R". 

For a point c S O and e > 0, let (c) be the points within Li distance e of 
c. Suppose that for a particular c and e, B\{c) C 0, and there is a known M 
such that (1/2)A'/ < f{x) < M for all x G ^^(c). 

Then to estimate Z{e) — /^.g^i^^-j dx, draw iid samples Xi, . . . ,Xpf 

uniformly from B^{c), and let the estimate be Z{e) ^ (2e)"" J2i f{Xz)/N. Then 
Z{e) is an unbiased estimate for Z{e) with standard deviation bounded above 
by Z{e)/Vk. 

Now the connection to TPA can be made. The family of sets will be {A{(3) = 
Bjj{c) n f2}, and the measure is fi{A{(3)) = /^.g^^^^ f{x) dx. The shell will be 
^(oo) (so Z = /Lt(A(oo))) and the center A{e) (with measure ^(e)-) TPA can 
then be used to estimate Z/Z{e), and the estimate of Z{e) can then finish the 
job. 



4 Running time of TPA 

Suppose that TPA is run k times, and the k values of the i variable at the 
end of each run are summed together. Call this sum N. Then N has a Poisson 
distribution with mean k\n.{^{B)/ fj,{B')). This makes N/k an unbiased estimate 
of ln(^(B)/^(B'))- The variance of N/k is ln(^(B)//^(B'))/fc. 

Let be a normal random variable of mean and variance 1, and Wa be 
the inverse cdf of W so that Pr(VF < Wa) — a. Then the normal approximation 
to the Poisson gives 



iN/k)-Wa/2VN/k,iN/k) + Wa/2VN/k\ (4.1) 

as an approximately 1 — a level confidence interval for ln(/i(i?)//i(_B')). Expo- 
nentiating then gives the 1 — a level for fj.{B)/ fi{B'). 

For a specific output, it is also possible to build an exact confidence interval 
for ^{B)/fi{B') since the distribution of the output is known exactly. 

Similarly, it is easy to perform a Bayesian analysis and find a credible interval 
given a prior on h\{^{B) / ^{B')). 

Lastly, consider how to build an (e, 5) randomized approximation scheme 
(RAS) whose output A satisfies: 

^■■(<' + ''"^MM^' + ^)>'-^^ 

For simplicity, assume that ^{B)/ fi{B') > e. Note that when ii{B) / ii{B') < 
e, then simple acceptance rejection can be used to obtain an (e, (5)-RAS in 
8(e^'^ ln((5^^)) time. See [4] for a description of this method. 

The following lemma gives a bound on the tails of the Poisson distribution. 
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Lemma 4.1. Let e > and N be a Poisson random variable with mean kX, 
where e/A < 2.3. Then 



Pr 



>e <2cxp rU^T 



2A 



A 



(This result is a special case of Theorem 6.1 shown later.) 

To obtain our (e, (5)-RAS, it is sufficient to make e = ln(l + e), and to choose 
k so that 2exp(-A:?2(l - g/A)/[2A]) < S, where A = ln(/x(B)///(B')). This is 
made more difficult by the fact that A is unknown at the start of the algorithm! 

There are many ways around this difficulty, perhaps the simplest is to use a 
two phase method. First get a rough estimate of A, then refine this estimate to 
the level demanded by e. 

Phase I Let = ln(l + e) and ki = 2e-^{l - Ea)"^ \n{2S-'^). Then let Ni be 
the sum of the outputs from ki runs of TPA. 

Phase II Set fc2 — Ni{l — €a)~'^- Let N2 be the sum of the outputs from fc2 
runs of TPA. The final estimate is exp(iV2/fc2). 

Phase I estimates A to within an additive error CaA. Phase II uses the Phase 
I estimate of A to create a better estimate of A to within an additive error of ea- 
Note that Ca ~ e in the sense that limg^^o fa/e — 1- 

Theorem 4.1. The output A of the above procedure is an (e,5) randomized 
approximation scheme for 11(B)/ fi{B'). The running time is random, with an 
expected running time that is 0((ln(/i(i3)//x(i?')))^e~^ln(5~^)). 

Proof. Call Phase I a success if Ni/ki is within distance CaA of A. From 
Lemma 4.1 with e = EqA: 



Pr 



ki 



>eaX]= 2exp {-ki{Xel){l - ea)) < S/2 



since A > 1 and fci — ea^{l — ea)^^ \n{26^^). Therefore, the probability that 
Phase I is a failure is at most S/2. 

When Phase I is a success, (1 — ea)Afci < A^i. In this event k2 — Ni{l — 



Ca)-^ > Afci =Xe-\l-ea 



Pr 



N2 
k2 



X 



> EqA < 2exp 



^ ln(2(5 ^). Plugging this in to Lemma 4.1 yields: 

Xe-^{l~ea)-'H2S-')el 



2X 



X 



Using A > 1, the right hand side is at most 26. 

The chance of failure in either Phase is at most S/2 + 6/2 = S, so altogether 
1(7X^2/^2) — A| < with probability at least 1 — 6. Exponentiating then gives 



(1 



< exp(iV2/fc2)/A < e' 



l + e 



with probability at least 1 — 6. 
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The expected number of samples needed in Phase I is fciA, while the expected 
number needed in Phase II is: 

E(iV2) = E{EiN2\Ni)) = E((l - eJ-^TViA) = 2(1 - ear^^a^ ln{25~')X^. 

Since Eq — 6(e), the proof is complete. □ 

5 Well-balanced nested sets 

Consider running TPA k times, and collecting all the values of f3i generated 
during these runs. Let P denote this set of values, then P forms a Poisson point 
process of rate k on [/3b',Pb]- 

Call /3b = ao > ai > ■ ■ ■ > ae — Pb' a well-balanced cooling schedule if 
fi{A{ai+i)) / fi{A{ai)) is close to 1/e for all i from to ^ — 1. 

Given P, finding such a well-balanced set is easy: simply order the (3 values in 
P, and set ai = The value of \n{^{A(on+i) / A{ai)y) will have distribution 

equal to the sum of k iid exponential random variables with mean 1/fc. So 
\n{pL{A{ai+i) / ^i{A{ai)))) will be gamma distributed with mean 1 and standard 
deviation 1/fe. 

6 Omnithermal approximation 

Suppose instead of just a single value of interest fi{B)/^{B'), it is necessary 
to create an approximation of fj,{A{/3))/ fj,{B') that is valid for all values /? G 
[(3b',Pb] simultaneously. Call this an omnithermal approximation. These prob- 
lems appear in what are called doubly intractable posterior distributions arising 
in Bayesian analyses involving spatial point processes. They are usually dealt 
with indirectly using Markov chain Monte Carlo with auxiliary variables [12], 
but omnithermal approximation allows for a more direct approach. 

In the last section the Poisson point process P formed from the /3i values 
collected from k runs of TPA was introduced. To move from P to a Poisson 
process, set 

Np{t) =#{beP:b>/3B't}. 

As t advances from to (3b — (3b', Np{t) increases by 1 whenever it hits a /? 
value. By the theory of Poisson point processes, this happens at intervals that 
will be independent exponential random variables with rate k. 

Given Np{t), approximate ^i{B) / fi{A{(3)) by exp{Np{l3B - l3)/k). When 
(3 = /3b', this is just the approximation given earlier, so this generalizes the 
description of TPA from before. 

The key fact is that Np{t) — kt is a right continuous martingale. To bound the 
error in ex-p{Np(t)/k), it is necessary to bound the probability that Np{t) — kt 
has drifted too far away from 0. 
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Theorem 6.1. Let e > 0. Then for Np{-) a rate k Poisson process on [0, A], 
where e/A < 2.3; 



Pr sup 
V te[o,A] 



Np{t) 



>?|<2exp(-^(l-i 



Proof. The approach will be similar to finding a Chernoff bound [2]. Since 
exp(aa;) is convex for any positive constant a, and Np{t) is a right continuous 
martingale, exp{aNp{t)) is a right continuous submartingale. 

Let Au denote the event that {Np{t)/k) — t > e for some t £ [0, A]. Then 
for all a > 0: 

Pt{Au) — Pr sup exp(aiVp(t)) > exp(a/ci + ake) . 

\te[o,\] J 

It follows from basic Markov-type inequalities on right continuous submartin- 
gales (p. 13 of [9]) that this probability can be upper bounded as 

Pr{Au) < E(aexp(A^p(A))/exp(a/cA + aA:e)). 

Using the moment generating function for a Poisson with parameter fcA: 

E[exp(a^p(A))] = exp(A:A(exp(a) - 1)), 

which means 

Pr{Au) < exp(A(e" - 1 - a) + ae)^ 

A Taylor series expansion shows that e" — 1 — a < (q;^/2)(1 + a) as long as 
a e [0,2.31858...]. Set a ~ e/A. Simplifying the resulting upper bound yields 

Pr(A,)<exp(^--(^l-- 

The other tail can be dealt with in a similar fashion, yielding a bound 



Pr sup [iVp(a)/fc] -t<l \ < exp - — 
\te[o.A] j \ Z\ 

The union bound on the two tails then yields the theorem. □ 

Corollary 6.1. For e G (0,0.3), S G (0, 1), and ln(^(B)/^(B')) > 1, after 

k = 2{\n{fi{B)/fi{B')){3e-^ + e'^) \n{2/d) 

runs of TPA, the points obtained can be used to build an (e, S) omnithermal 
approximation. 
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Proof. In order for the final approximation to be within a multiplicative factor 
of 1 + e of the true result, the log of the approximation must be accurate to 
an additive term of ln(l + e). Let A = ln(/x(B)/^(B')), so fc = 2A(3e"i + 
e~^) ln(2/(5). To prove the corollary from the theorem, it suffices to show that 
2 exp(-2A(3e-i + e'^ ln(2/(5)[ln(l + e)]^{l - e/A)/(2A)) < S. After canceling the 
factors of A, and noting that when A> 1, 1— e/A< 1 — e, it suffices to show 
that (3e-i + e-2)(l - e)[ln(l + e)]^ > 1. This can be shown for e e (0, 0.3) by a 
Taylor series expansion. □ 

6.1 Example: Omnithermal approximation for the Ising model 

Consider the following model. The value of /? is drawn from a prior density 
/prior (•) on [0,cx)), and then the data (conditioned on /S) is drawn from the 
Ising model. This was used by Besag [1] as a model for agriculture wherein soil 
quality of adjacent plots was more likely to be similar. 

Given the data X, the posterior in the Bayesian analysis is the following 
density on j3 : 

exp(W£(X)) 

/post(&) OC /prior(&) . (6.1) 

The evidence for the model is the integral of the right hand side of (6.1) as b 
runs from to cx). This is only a one-dimensional integration, and so should be 
straightforward from a numerical perspective, except that Z(b) is unknown. 

Here is where the omnithermal approximation comes in: it gives an ap- 
proximation for Z{b) that is valid for all values of b at once. Any numerical 
integration technique can be used, and the final value for the evidence (not in- 
cluding error arising from the numerical method) will be within a factor of 1 -I- e 
of the true answer. 

Figure 1 presents two omnithermal approximations for log Zj^ generated us- 
ing this method on a small 4x4 square lattice. The top graph is the result 
of a single run of TPA from /3 — 2 down to /3 = 0. At each [3 value returned 
by TPA, the approximation drops by 1. The bottom graph is the result of 
[ln(4 • 10^)] = 16 runs of TPA. This run told us that Z2 < 217 with confidence 
1 — 10^^/2. Therefore, using e = 0.1, and 5 = 10^/2 in Theorem 6.1 shows that 
r = 330000 samples suffice for a (0.1, 10~^) omnithermal approximation. 



7 Conclusions and further work 

The strength of TPA is the generality of the procedure, but that same generality 
means that it is possible to do better in restricted circumstances. For instance, 
when f{x) falls into the class of Gibbs distributions, Stefankovic et al. [17] were 
able to give an 0{\n(Z)) algorithm for approximating Z, but the high constants 
involved in their algorithm make it solely of theoretical interest. (Here the O 
notation hides logarithmic factors.) TPA can be used in conjunction with their 



11 




Figure 1: Omnithermal approximations for the partition function of the Ising 
model on a 4 X 4 lattice 



algorithm [6] to build an 0{\n{Z)\n{\n{Z))) algorithm, and work continues to 
bring this running time down to 0(ln(Z)). 
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